{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bf408a74",
   "metadata": {},
   "source": [
    "## 9.2.1 文本序列数据"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c8a448c",
   "metadata": {},
   "source": [
    "上一节我们学习了序列数据的概念，并用一个时间序列模型演示了序列建模方法。与时间序列数据不同，文本序列数据是一类有序的文本数据，每个数据项都是一个字符串，并且这些字符串是有序的。文本序列数据常常被用来表示自然语言文本，例如句子或者段落。例如，一段英文文本可以表示为一个文本序列数据，其中每个数据项都是一个单词。\n",
    "\n",
    "文本序列数据常常被用来做自然语言处理任务，例如文本分类、机器翻译、文本生成等。在这些任务中，常常需要使用深度学习模型来处理文本序列数据，并使用文本序列数据来训练这些模型。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d4cbf8c",
   "metadata": {},
   "source": [
    "## 9.2.2 文本预处理"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "934e4c07",
   "metadata": {},
   "source": [
    "文本预处理是指在进行自然语言处理任务之前对文本进行的一系列处理步骤。目的是使文本变得更适合进行自然语言处理任务，并且可以提高处理效率。\n",
    "\n",
    "文本预处理的步骤通常包括：\n",
    "\n",
    "1. 去除文本中的噪声，例如标点符号、HTML标签、空格等，或者停用词。停用词是指在文本中出现频率较高，但没有实际意义的词，例如“的”，“了”等。\n",
    "\n",
    "2. 将字符串拆分为词元（如单词和字符）。\n",
    "\n",
    "3. 建立词表，将拆分的词元映射到数字索引。\n",
    "\n",
    "4. 将文本转换为数字索引序列，方便模型操作。\n",
    "\n",
    "5. 编码文本，例如使用one-hot编码将单词转换成数字向量。\n",
    "\n",
    "进行文本预处理的目的是使文本更适合进行自然语言处理任务，一方面可以把文本数据转化成计算机能理解的数字和向量，另一方面，可以减少噪声，提高模型的训练效率、提升模型的泛化能力等。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8e30ace",
   "metadata": {},
   "source": [
    "## 9.2.3 数据读取"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5eb3907",
   "metadata": {},
   "source": [
    "我们每天都会阅读并亲手创造大量的文本数据，比如看网站的新闻，和朋友互发消息，写工作报告等等，包括你现在看的这篇教程。相对新闻网站和聊天数据，小说是一种更为优质的文本数据。我们以金庸的短篇小说《越女剑》为例，看一下文本预处理的每一步的操作。\n",
    "\n",
    "首先是数据的读取，方法非常简答。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "577c64bb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "191\n",
      "['越女剑\\n', '“请！”“请！”\\n', '两名剑士各自倒转剑尖，右手握剑柄，左手搭于右手手背，躬身行礼。\\n', '两人身子尚未站直，突然间白光闪动，跟着铮的一声响，双剑相交，两人各退一步。旁观众人都是“咦”的一声轻呼。\\n', '青衣剑士连劈三剑，锦衫剑士一一格开。青衣剑士一声吒喝，长剑从左上角直划而下，势劲力急。锦衫剑士身手矫捷，向后跃开，避过了这剑。他左足刚着地，身子跟着弹起，刷刷两剑，向对手攻去。青衣剑士凝里不动，嘴角边微微冷笑，长剑轻摆，挡开来剑。\\n', '锦衫剑士突然发足疾奔，绕着青衣剑士的溜溜的转动，脚下越来越快。青衣剑士凝视敌手长剑剑尖，敌剑一动，便挥剑击落。锦衫剑士忽而左转，忽而右转，身法变幻不定。青衣剑士给他转得微感晕眩，喝道：“你是比剑，还是逃命？”刷刷两剑，直削过去。但锦衫剑士奔转甚急，剑到之时，人已离开，敌剑剑锋总是和他身子差了尺许。\\n', '青衣剑士回剑侧身，右退微蹲，锦衫剑士看出破绽，挺剑向他左肩疾刺。不料青衣剑士这一蹲乃是诱招，长剑突然圈转，直取敌人咽喉，势道劲急无轮。锦衫剑士大骇之下，长剑脱手，向敌人心窝激射过去。这是无可奈何同归于尽的打法，敌人若是继续进击，心窝必定中剑。当此情形，对方自须收剑挡格，自己便可摆脱这无可挽救的绝境。\\n', '不料青衣剑士竟不挡架闪避，手腕抖动，噗的一声，剑尖刺入了锦衫剑士的咽喉。跟着当的一响，掷来的长剑刺中了他胸膛，长剑落地。青衣剑士嘿嘿一笑，收剑退立，原来他衣内胸口藏着一面护心铁镜，剑尖虽是刺中，却是丝毫无伤。那锦衫剑士喉头鲜血激喷，身子在地下不住扭曲。当下便有从者过来抬开尸首，抹去地下血迹。\\n', '青衣剑士还剑入鞘，跨前两步，躬身向北首高坐于锦披大椅中的一位王者行礼。\\n', '那王者身披锦袍，形貌拙异，头颈甚长，嘴尖如鸟，微微一笑，嘶声道：“壮士剑法津妙，赐金十斤。”青衣剑士右膝跪下，躬身说道：“谢赏！”那王者左手一挥，他右首一名高高瘦瘦、四十来岁的官员喝道：“吴越剑士，二次比试！”\\n']\n"
     ]
    }
   ],
   "source": [
    "with open('data/越女剑.txt', 'r') as f:\n",
    "    lines = f.readlines()\n",
    "print(len(lines))\n",
    "print(lines[:10])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e27b662e",
   "metadata": {},
   "source": [
    "我们通过打印，可以看到《越女剑》这篇小说确实不长，只有191行文字，打印前10行，可以看到小说的内容。里面有很多标点，换行等符号。在某些任务中，这些符号自然有其存在的意义，但更多时候，这些符号可以作为停用词去掉。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a771dc6",
   "metadata": {},
   "source": [
    "## 9.2.4 去除文本噪声"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7994f780",
   "metadata": {},
   "source": [
    "去除文本噪声是指在处理文本数据时，删除文本中不相关或者无用的信息。去除文本噪声的目的是使得文本中有意义的信息更加突出，从而提高文本分析和处理的效率。\n",
    "\n",
    "前面讲到的去停用词在传统的自然语言处理任务中非常常见，因为一些词语在文本中出现频率过高或者并不具有实际意义，所以被认为是无用信息的词，所以需要提前去掉。停用词包括英文中的介词、代词、连词等，中文中的助词、量词、叹词等。\n",
    "\n",
    "在深度学习中，是否需要去除停用词，取决于具体的应用场景和需求。\n",
    "\n",
    "在一些情况下，去除停用词可能会有帮助。例如，当文本数据中停用词的出现频率很高，且并没有太多意义时，去除停用词可以减少计算量，并使得文本中有意义的信息更加突出。\n",
    "\n",
    "但是，在另一些情况下，去除停用词可能会带来负面影响。例如，在文本分类任务中，停用词可能包含与文本类别相关的信息，如“不”在负面评价中的出现。如果去除了停用词，可能会影响模型的准确性。\n",
    "\n",
    "这里，我们用的金庸小说内容质量比较高，所以不需要进行去停用词操作。我们只去除掉其中的标点符号，由于都是中文标点，方便起见，这里我们引入了zhon这个包，大家运行之前需要执行  **pip install zhon**  来安装一下这个包。另外，大家应该也注意到了，每一行结尾都有一个换行符“\\n”，我们也把它去掉。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "95a30c83",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['越女剑', '请请', '两名剑士各自倒转剑尖右手握剑柄左手搭于右手手背躬身行礼', '两人身子尚未站直突然间白光闪动跟着铮的一声响双剑相交两人各退一步旁观众人都是咦的一声轻呼', '青衣剑士连劈三剑锦衫剑士一一格开青衣剑士一声吒喝长剑从左上角直划而下势劲力急锦衫剑士身手矫捷向后跃开避过了这剑他左足刚着地身子跟着弹起刷刷两剑向对手攻去青衣剑士凝里不动嘴角边微微冷笑长剑轻摆挡开来剑', '锦衫剑士突然发足疾奔绕着青衣剑士的溜溜的转动脚下越来越快青衣剑士凝视敌手长剑剑尖敌剑一动便挥剑击落锦衫剑士忽而左转忽而右转身法变幻不定青衣剑士给他转得微感晕眩喝道你是比剑还是逃命刷刷两剑直削过去但锦衫剑士奔转甚急剑到之时人已离开敌剑剑锋总是和他身子差了尺许', '青衣剑士回剑侧身右退微蹲锦衫剑士看出破绽挺剑向他左肩疾刺不料青衣剑士这一蹲乃是诱招长剑突然圈转直取敌人咽喉势道劲急无轮锦衫剑士大骇之下长剑脱手向敌人心窝激射过去这是无可奈何同归于尽的打法敌人若是继续进击心窝必定中剑当此情形对方自须收剑挡格自己便可摆脱这无可挽救的绝境', '不料青衣剑士竟不挡架闪避手腕抖动噗的一声剑尖刺入了锦衫剑士的咽喉跟着当的一响掷来的长剑刺中了他胸膛长剑落地青衣剑士嘿嘿一笑收剑退立原来他衣内胸口藏着一面护心铁镜剑尖虽是刺中却是丝毫无伤那锦衫剑士喉头鲜血激喷身子在地下不住扭曲当下便有从者过来抬开尸首抹去地下血迹', '青衣剑士还剑入鞘跨前两步躬身向北首高坐于锦披大椅中的一位王者行礼', '那王者身披锦袍形貌拙异头颈甚长嘴尖如鸟微微一笑嘶声道壮士剑法津妙赐金十斤青衣剑士右膝跪下躬身说道谢赏那王者左手一挥他右首一名高高瘦瘦四十来岁的官员喝道吴越剑士二次比试']\n"
     ]
    }
   ],
   "source": [
    "import string\n",
    "from zhon.hanzi import punctuation\n",
    "\n",
    "exclude = set(punctuation)\n",
    "lines = [ ''.join(ch for ch in line if ch not in exclude).replace('\\n','') for line in lines]\n",
    "print(lines[:10])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90102006",
   "metadata": {},
   "source": [
    "## 9.2.5 词元化"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5243bf8d",
   "metadata": {},
   "source": [
    "可以看到，虽然我们去除了一些噪声，计算机依然没有办法把这些文字直接接收作为模型的输入。这时就需要进行一个名为Tokenizations的操作了。\n",
    "\n",
    "关于术语Tokenizations的翻译，其实并不是很一致。有的叫做分词，有的叫做令牌化，还有的叫做标识化。感觉翻译的都不太到位，梗直哥这里倾向于叫它词元化。\n",
    "\n",
    "词元化的目标是把输入的文本流，切分成一个个子串，每个子串相对有完整的语义，便于学习embedding表达和后续模型的使用。其中对于英文的切分比较简单，基于空格和一些符号就可以了。中文的词元化一般有三种方法，将文本切分为字或词或词缀。\n",
    "\n",
    "这里为了简单化，我们就把文本切成字。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3d80264c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['越', '女', '剑']\n",
      "['请', '请']\n",
      "['两', '名', '剑', '士', '各', '自', '倒', '转', '剑', '尖', '右', '手', '握', '剑', '柄', '左', '手', '搭', '于', '右', '手', '手', '背', '躬', '身', '行', '礼']\n",
      "['两', '人', '身', '子', '尚', '未', '站', '直', '突', '然', '间', '白', '光', '闪', '动', '跟', '着', '铮', '的', '一', '声', '响', '双', '剑', '相', '交', '两', '人', '各', '退', '一', '步', '旁', '观', '众', '人', '都', '是', '咦', '的', '一', '声', '轻', '呼']\n",
      "['青', '衣', '剑', '士', '连', '劈', '三', '剑', '锦', '衫', '剑', '士', '一', '一', '格', '开', '青', '衣', '剑', '士', '一', '声', '吒', '喝', '长', '剑', '从', '左', '上', '角', '直', '划', '而', '下', '势', '劲', '力', '急', '锦', '衫', '剑', '士', '身', '手', '矫', '捷', '向', '后', '跃', '开', '避', '过', '了', '这', '剑', '他', '左', '足', '刚', '着', '地', '身', '子', '跟', '着', '弹', '起', '刷', '刷', '两', '剑', '向', '对', '手', '攻', '去', '青', '衣', '剑', '士', '凝', '里', '不', '动', '嘴', '角', '边', '微', '微', '冷', '笑', '长', '剑', '轻', '摆', '挡', '开', '来', '剑']\n"
     ]
    }
   ],
   "source": [
    "tokens = [list(line) for line in lines ]\n",
    "for i in range(5):\n",
    "    print(tokens[i])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b25bddd",
   "metadata": {},
   "source": [
    "## 9.2.6 词表"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4233b7e9",
   "metadata": {},
   "source": [
    "可以看到，虽然我们拿到了词元列表，但它依然是一堆字符串，不是计算机可以计算的数字。此时就需要我们构建出一个词表来。\n",
    "\n",
    "词表是指在自然语言处理（NLP）中，将文本数据中出现的所有词汇组成的列表。词表可以帮助我们了解文本数据中词汇的分布情况，并为后续的文本处理建立基础。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "d547178c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import collections\n",
    "\n",
    "class Vocab:\n",
    "    def __init__(self, tokens=None, min_freq=0, reserved_tokens=None):\n",
    "        if tokens is None:\n",
    "            tokens = []\n",
    "        if reserved_tokens is None:\n",
    "            reserved_tokens = []\n",
    "        # 按出现频率排序\n",
    "        counter = count_corpus(tokens)\n",
    "        self._token_freqs = sorted(counter.items(), key=lambda x: x[1],\n",
    "                                   reverse=True)\n",
    "        # 未知词元的索引为0\n",
    "        self.idx_to_token = ['<unk>'] + reserved_tokens\n",
    "        self.token_to_idx = {token: idx\n",
    "                             for idx, token in enumerate(self.idx_to_token)}\n",
    "        for token, freq in self._token_freqs:\n",
    "            if freq < min_freq:\n",
    "                break\n",
    "            if token not in self.token_to_idx:\n",
    "                self.idx_to_token.append(token)\n",
    "                self.token_to_idx[token] = len(self.idx_to_token) - 1\n",
    "\n",
    "    def __len__(self):\n",
    "        return len(self.idx_to_token)\n",
    "\n",
    "    def __getitem__(self, tokens):\n",
    "        if not isinstance(tokens, (list, tuple)):\n",
    "            return self.token_to_idx.get(tokens, self.unk)\n",
    "        return [self.__getitem__(token) for token in tokens]\n",
    "\n",
    "    def to_tokens(self, indices):\n",
    "        if not isinstance(indices, (list, tuple)):\n",
    "            return self.idx_to_token[indices]\n",
    "        return [self.idx_to_token[index] for index in indices]\n",
    "\n",
    "    @property\n",
    "    def unk(self):  # 未知词元的索引为0\n",
    "        return 0\n",
    "\n",
    "    @property\n",
    "    def token_freqs(self):\n",
    "        return self._token_freqs\n",
    "\n",
    "def count_corpus(tokens):\n",
    "    # 这里的tokens是1D列表或2D列表\n",
    "    if len(tokens) == 0 or isinstance(tokens[0], list):\n",
    "        # 将词元列表展平成一个列表\n",
    "        tokens = [token for line in tokens for token in line]\n",
    "    return collections.Counter(tokens)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "9859af02",
   "metadata": {},
   "outputs": [],
   "source": [
    "vocab = Vocab(tokens)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "bd19d1ab",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('<unk>', 0), ('剑', 1), ('的', 2), ('一', 3), ('不', 4), ('道', 5), ('士', 6), ('是', 7), ('了', 8), ('人', 9)]\n"
     ]
    }
   ],
   "source": [
    "print(list(vocab.token_to_idx.items())[:10])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6237bb3",
   "metadata": {},
   "source": [
    "## 9.2.7 词嵌入"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc5d91d8",
   "metadata": {},
   "source": [
    "除了这种这种词元索引，深度学习的输入有一种更好的形式——词嵌入。词嵌入（word embedding）是指将单词映射成一个实数向量的过程。通常来说，词嵌入会把每个单词映射成一个低维度的实数向量，这个向量可以用来表示单词的语义。\n",
    "\n",
    "词嵌入的好处在于它能够把单词转换成数值，这样我们就可以使用数值计算来处理单词之间的关系。例如，我们可以使用词嵌入计算两个单词之间的相似度，也可以使用词嵌入计算单词之间的距离。这些计算可以帮助我们解决自然语言处理中的许多问题。\n",
    "\n",
    "通常来说，词嵌入是使用神经网络学习得到的。我们可以训练一个神经网络来预测一个单词的上下文，然后使用这个神经网络的隐藏层来表示单词的语义。这里我们来讲一种最基础的词嵌入方法——独热编码。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3044ead9",
   "metadata": {},
   "source": [
    "### 9.2.7.1独热编码"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45ebd680",
   "metadata": {},
   "source": [
    "独热编码是一种常用的特征工程方法，它可以帮助我们把离散特征转换成数值从而输入模型。举个简单的例子，对于这么一句话“猫吃鱼”，我们可以用三个简单的向量来表示句子中的每个词。"
   ]
  },
  {
   "attachments": {
    "image.png": {
     "image/png": "iVBORw0KGgoAAAANSUhEUgAAATIAAAElCAYAAABnFW7mAAAgAElEQVR4nO3dfXhU5Z038K8PpserOqjQCdh0EDqS4WUTiWAamW7iVCGNoA2URqVQsICr+NK4jxEDu9vsPkaruCu4VnadYEsJGPCNFiVOiomkJEBAYsAAExKIgdQkY0ASKUmmXPfzx5x5STIzGWBeck++n+ua6zpk7pnzy5k737nPOfc5XCWEECAiktj/iXQBRERXikFGRNJjkBGR9BhkRCQ9BhkRSY9BRkTSY5ARkfQYZEQkPQYZEUmPQUZE0mOQEZH0GGREJD0GGRFJj0FGRNJjkBGR9BhkRCQ9BhkRSY9BRkTSY5ARkfQYZEQkPQYZEUmPQUZE0mOQEZH0GGREJD0G2aBQifwJEzBhwgRMyK+MdDEUYZX5al+YMB9FjZGuBkDXSXz0uxIMhlJ8YZANCt34ymqF1WqF9avuSBdDEdb9ldoXrE345mIkK7Gj7S/P447Y7yNjfRMiWsoAro50AUQ0WO3GC6mrsDfSZQSAIzIikh6DjIikxyAjIukNuiCreS1VPWMzD4UNPhq1/hH/5DzLl27GER/NOkpy1PdKxWs1/Z+3dx7HJwX5eGzuD9R2E5C+OB8Flhr8tct3jY1F89X2+agEcObo+3hefY8fzF2JAkst2ux91tVWC0tBPhanq3X/YC4eW/s+avs2JP+6/oqa99fi2cXprs/sB3Mfw9rN+3C808+2rMzvcybQjs7jn6AgfzHS+7zPST+fvYu9E8f3bcbaZ92vn5C+GPkFFtT46zyXw96J458UIN/1O/8Acx9bi837TiKwUi+1nzvPoi/EeuePDqx0/Z7zB8Wp1D7EINNTuUpoAQFAZBSe9trm3PaFAmobwCjM9V7fSZRmK4422jyxr9dTraI8P0VoXO/h5aGZLJ7c3iR6vLxzvdmotssW75VmC13f12pXiUrXC9vFQfN8oVd8rEfRi/lF68Ry57+zS69k80WxC6K20M92BAQUrbhv7UHR7u3lpdnu/mJtEtsf1QvF52efIvIrvb6Lo5LaQjFfr/iuAxqRkl8uWr11ngCUZrv79rq/bBeP+lmXJiVf+Cz1svt5qcj28xqj9z+4iBp0QSZ6SkW2s7NmFokWL0125fT+YDOLvLTyeB/tqkr3B9VjFW+ma3p9kHNyzeJ9i0VY3jeL3DmTPT54RZjM1n5h5g4yRSiKI4xm528SFsv7wvxEmjC9XutcmbCaTe4/GEUvZjvXtWmNWJSm7f/HxCDzol2U5ngEj6IVaYvWiE0Wi7BYNok1i9KE1iPgdNml/cPMFWQGYTLpBAChaNPEojWbhMViEZZN+WLOZI9+oSwU270ERHtlnkhS3P1Dm7ZIrNlkERaLRWxas0ikaRX/dQTAHWRq/wKEJqX3elI07t9XySwS/b7yr6ifnxbVFouwWF4Us5xtDMvFHyyO9e+uO3cZv1VoDb4gE+fEjiXOzrBQbO+3zWrEakPvP34lu7T/yGlfnjqy04pV7uGRRwhBKKbV4qCXnnahtlBk6ZzvrxM5u3oX4fkegFGss/r46q1dKxKc7XRLxPamvu0uiNrCrN4jOgZZP+dKs12jdO/bUYie1nKRl6S4/zDf7DNqcAWZGjJLtov+b9MuSrN1vkce53aJHI9+scTbiL2nSWxfonPVkVnkfa/CH3eQqet574S40K/UUpGtc/fBvqUGo5/3GpkZzWLwjcPcBmGQee46KiK7tE9XqTcLo/rhLVxoEF53HYUQ+/K06jdrtnC9xbkdYonz21S7XJT4+brsqVntDqGMwl7feL06ibcQdbyDqFyldXWSvH2+9jPOidJsrWCQ+VIr1iYE8KUhhBDt28VC1+fruXsvegdZ3+c8ufpX/8+idm2CO+TW9R+pu/TsE3nOgPDSNwfiGWS99ib6leruh71KDVI/lynIBt3BfgAYfvssZAIAurGpvLrXc60HPkQFABgy8fTS+6AAgO3PONTrxMAhlL9lAwAoj96HH8Y4fmr/tASF6sR5wzP/hBkjfNcQk7gQqzLVfxSvR4mP45sZKZMQ4/WZapS8YVNX9iTmJHtvBQyH6aFnYPBdytDWUImth9Xlhc9ifryv7QhgxGw8nqt1LNtexvt7vDdTFs/EHb7eRq/H7c7lmkaPy3IaUOkuBM/Oj/fxuQOIScacJ9VP1PZbFFf5Ltk/BYtn3uFzPXq9q1Lsb3D/AYSinw92gzLIMOpOPKBuXNtb5TjkesKOI3uLAQDaB1ORaJiKDABABT480Op+/aESFFgBQMGCmVNdHaGpYT8cn6+C+5ITByoC02YZ1eUyHDrpvdXY2FHen2isxS41x5AxFX7XlpiM+5QByhmqmmodX1wATKZ/wPABmien/lxd6kZlnffT3tPi4wJbd89F92U5HUdR7izEmIqJAxSSODVDXbJhV+3lpsM0BFqqp1D088FucAYZRuGH89SOYN2GKld/rEb5JsdH9OOpE4BR0+D8DLaVH4Lz5HtD1TY4cmwBfmp097imWmdPDKyDjLn5VtdyY1urlxZGTB7j48VtTfjc2cpnI6c4xE8buJ6hqKFhv2v51rFjB35BnAHJ6mJdc3vwCrG14IRzuWIZbrnqKlzl7/GjNa6Xft7UFrw6AhD8fj74DdIgA+JS5sKRUR6jrUPlcOwxZmLW7cMB6JGcqQ7h366EYye0AWW/Vz/IrEwYB/oK9yPm6m9d/ovPfw3bwK0o2IZd7XuXL0JsX5+PdAl+XVE/HyQG70Xj+unISgAqDgPbPtyPjvtn48ynOxwjLeMsTFP36BKT74OC1ei27cDBul8j+foD+FDNsYVZqb12RUbGGQBYAXyFQPpWY6N7Fq3PXUhfxkyGEUAFnCMDvZ/GF/F3zov1Sjv6+4C6c9l+vgMYaOeyuQ4H1MX4uJGhKWrZ+/jyP1ICb6/cEJo6fAhrPx8kBu2IDJiEO3+hjra2foxP7R34vKwMAGDITHbHwpRUZAEAqmCpbkXrJ0XYBsA9anMbpZuoLllRcXSgIbQdJw857w1mwCX/TYy5Gc4Bu+2zk/C7NvsXOHzZB4Sj2/BYPdTD9/jo02MDtm+oq4TzRkjfH6312/aS6OMx3Xkc88hXOD96NEYH+rjxmuDVEYCw9vNBYhAHGZCY+qCjE3dvQvmBT/HxVgBQkDF1orvR8NsxSz0xsK28EpUfOmIMmQ/gzj5fLqOmzYLzsOa2DTvR7G/lHTuxaZ36J6GdB+NEf429iElEqvNs0LYN2OlnZR07t7gvBaHeklLxczVAbL/9E/b4HbkewYevOo8PZSA96QqOK/QzBaYFaiEVr+JDX9fFqZrffQgT0hfj2WefxcaDfwtiHQMLaz8fLCI9/8O/fSJP65jHYjAa1UmR/SfJ1qx2zifTuiZOep3tL9rF9oUekya9zNoXQgjR0ypKPCZGJqyu6dXOPX/H1+VRDp4TORWTWXidAtVrYiPnkfXXI2pWu+dv+Z4tf0HUrnNfRaEs3N67ncc8Mv+X2PieO+V5+ZxiWidq+81SVbWXiOVa52eaIXxcaee7Ao9LlPyX6ut3Ck4/F2KXyFHkmEc2yIPMc1Kp+vB22ZJrFn8Anaf+TWHyuMQkKec98VlTh+MD7OkQbUeLRW6Kx6UdujzRdy5roEHWd6a4JiVXFB8945il3dMhmvaY+1+zxyDrr9eMeghd1lpR5tyO4oI4c3KPMM/3vIQpU/SbUB+EIOv7eSr6+WJt2VFxRg20no42cbQ4t9flQ/3DYWBXHmQiKP1ciHphNron1m5v9pXckTfIg0yIntLsXtcjeu2EPSXui64D+Pa4UGsW6R6dzdfD1wW5gQeZEKLHKoqydH7Xo5iyxfJkBpk/Pa0lIsfvhdrucCnyNvQNSpAJxyVI/i44dz0Uoc8puawLx4MSZOLK+7nXgQQgtHmXeq1C6A3qY2QAEDN1Jha4JotqMSPRy9k/z+NRAIyLTX7PEV4zaSk+ajuB4vw5SNL2nYmqQJs0B/nFJ9C2ZyXu8DMrOiAx8bh/Sx1OFOdjzmRN7+c0kzEnvwTHLY8jabDNGRhkYmJn4KWjTdhf+ARmxmn6Pa+Jm4knCvej6egm3O9v9v8VF6LD7NePoml/IZ6YGQcvlSBu5hMo3N+Eoy/NQGwEP9cr7+cxuOOfLTDP18Pz1QOevIqAq4QQItJF+GdHp60d5y8CwDBcO1ILjZfO0XW2BV+rxyyVG0bjUk4U2TttaD9/0e/7X0l7r68ddi1GajXqnCeP31G5IexnuaTUdRYt6gce0Oft0X7YtSOh9fmhdeFsy9eOM5+9PiOfb+xuDwU3jL4RV/rpufvyAP0r4N/JISj91rGyALZLeEkQZERE/g36XUsiooEwyIhIegwyIpIeg4yIpMcgIyLpMciISHoMMiKSHoOMiKTHICMi6THIiEh6DDIikh6DjIikxyAjIukxyIhIegwyIpIeg4yIpMcgIyLpMciISHoMMiKSHoOMiKTHICMi6THIiEh6DDIikh6DjIikxyAjIukxyIhIegwyIpIeg4yIpMcgIyLpMciISHoMMiKSHoOMiKTHICMi6THIiEh6DDIikh6DjIikxyAjIukxyIhIegwyIpIeg4yIpMcgIyLpMci8aSzC/AkTMGHCfBQ1em9SmT8BEyZMwIT8yvDWRkT9RHmQdeHk+8tx209+hzp73+fs6Ozs8v6yi9+gyWqF1dqEby56b9L9lRVWqxXWr7qDWTARXYboDrLGd/Dog+tQ/adf4u5nynDG87mGDcgYHos7nv8L2vqFHBHJJLqDbOwCbD9khkkBTq3JwLyCOjgyqxlbnn4cFehE9f6/wh4T4TopPLrOoqWlBbbOyHxzdZ1tQUtLC1pazsLHvkBY2Dttah02RGhTBN1VQggR6SJCrXnLHOgf2IZuJGB1zad48GiW49+KCW/WvgV98SvYcdrjBWer8OYbZbBBC9PDv0Tyjf3fs37Hi3j3MICEn2LFPbf0fnLaQvxm3uRQ/kp0yc6g7Kkp+NGaUzCa67F7qT48q7W34cDW5/D0U29gl83jMISiRdrDr+Dlf8nCtNhwfJN24eRH/4X/+/Rv8H5tp8fPNZg851m8/J//jB+PuyYMdYSIGBLaRclyrQAgkJAkkhQIQBGmN+uFEPXCbITjuWA9sksj/QtTLxdE7TqTUNTPx2iuD89qe6zizXSN/76imMS62gshLqRdlOboXb+/94dOZJe0ip4QVxIqQ2JEBgBo3oI5+gewTf1SVBZux1//MBsj0Iry1zgii1r2U/jgVybMW9cA53goPCMy9wgQUKCf/xoKVy9Aynevgb3zFD7d8jSylm3FKQDQLkfJsd9ixohQ1GFHXUE6EpeVoRuAJiUXW/+Qi7vGaxDTdRbHPnkB8zNXo7obAIxYZy3DI/ESHmuJdJKGTU+teDXZ+e0zR2xt9dO23iyMgACMwteXd2k2R1+DW49o3W8W8/VKv9FHOEZkPfvyhE5dn3Z5iWj30qa9NNvVRpe3LzSjodNFIlNRf3fjOmH1spIeq1mY1DZKZpE4HYo6Qiy6D/Y7dR3B/6TfhiernD/YAcunHZGsiEKo6+RHeP7eiRhz+zJsbnCMwxSNBkrYKujAzjdecIy2kIFXVs6At8HWCNO/4vWFjqpOvfAGdoagSx55N1/dC9Fi1eol8DbYiolfhDXPJQAAurfl490jwa8j1KI/yM7swb9Pvw2PlnUDiglZmVoA3Vj/VjkYZdFpz6sZWPWBc1dSgX5+IQ7u/S9MC1cB9k9RUqjuyBrnIiXOV8MRSM3Kcix2F+LdimD3yAZUbj3sWFR+jrum+dpljEHizF/AAAA4jIKd8iVZVAfZ344U4v4p05FX3Q0oScgrewdbch+DFgA2bkW5r36jTcOLFgsslheRpvXeJH6RBRaLBZZF8SGqnq6UZvJimPc34eimn2NS+IZjwEkrKtUcM2Qmw9/RuOETU2EEAHTjT1XHgluH/QSqK9TlrLsw1d+hr8SpyFAXD5cdRmtwKwm5qyNdQCh9e9I9eOSnOmxdPwlrd23Gk0kjAHsqfq4Aa7q3ovyzP2B2qqNta/lreKXXEX9V6fYB1lKK0iLn8vdwz1OPI3VUEH8JumTKxCdgLnsKC+4ch4hMKGi2wnkUY6JugM6g1+N2ABUAbA1t6AAwPFh1NH2BGnVRq48d4H3HYLJRLeToKbQCkKkbR3WQASNgeukAmnNvxHedc3VipuKuLGDNxm7sOlIHpDpGVN/UFeHFFyv8vFcgjLhlGYMs0qY//CqmR3D9DQ37XctjYy+hM5xogQ3BDLJaOHt0fNzIwF9nbUZ7sGoIkygPMgAxsfhurPenqg5/ATvi0WvEPetFWJ6c0qdlK/745C/wuhWY9aIF/Z/+I578xeuwBq9qGjI8RkIRpYfeOTSUUPQHWR9nyn6N5RvVf9R8gSag9zGM8bdj5kxTn1c1oOk7AKzA+Ntnov/TTVCfJqIIiOqD/f00b8GSjDXqaXEAB+rQHMl6iCgohk6QnSnDM2mLsK0b0JlMjlPN3efRHSUXzZKs2tFcF+kaAKAVbT7uvSeDoRFkXUfwP/MysLqhG0hYjQ925OIuAEANvmiKcG0Uda7T+Jiz49V5fG0LUSHX3oDAK/kGnaGqIwyi/xjZmWq8PO8O5JR1A7osFL3zKyResxvf8tX++H6UlPQdprWi+ivn0yXo/3Q1vgpu1SSxUTeNhwKgG8D+hgbA5GcmWWsbXAOh2/V+55xdstgx+AcAZQAqavsdDe7DY2RonIwxwawjDKI6yOynPsCvTPOwrsExq9+8sxD397pGw46/970D7IcrkP6h7/f8cEU6/DxNBBgck0u3AThQ1wy/AfLFMdeJQlPiuODWMXYy0rRAmQ1ATSMaAYz11bb1JD5TR2TK9PjgBmoYRHWQxVwbh++PAXDahHUHd2CpK8TiYEgGUFUFazMAz8n53u5mgbOoevMNlNmAhJ+uQP+nnXfLIAIwKgGmBGDbYaD7T1U49FIqEn00PbL3XbXfaJE22WfMXKYJSL5PAdZ3A2Vl+LzjIYz1MUmt4+CfUawuZ0w1BLmOMIj0Vesh135EfN7U95J/9z3InDevqDcb/dzNon/73k8PfLcMijDXZxSeu1/Urk1Q77ahFdml57w36tkn8nTqnSkS1oraENRxbseSAO7DdloUZap3CVGWiB0+yh3Mov9g/4iJmKzre5HZRfydZysphCYteB6OG1vYsGbRMmzp+7/f2E/hg0fnIe8UACjIXPVTTApBHcPvfhLqjS1Q8fjP8O97zvRpcQbVLy/EIvVGfbrch3F30C4tCJ/oDzKvmmGtAgAjJst2VJPkMGI21hRnQwcAp7bigcSJuHdlAbaVlGBbwUrcO3E87l3vmNGoyy7G+vt93iLjysQk4lfvOP7fCnRXI2/6WNyxeC02l5SgZPNaLL5jLG7Lcdx0UTGZsXNlMiS8rWJ0HyPzqaEBjqvhtNBcF+FaKGqNML2Eiu3dMM1bh4buBnzwwjJ88IJnCwX6+Ruw4yWT1/uVBUtM/FK8s+dvmJ/2K1g6O7F3Qzb2bujdRpOSD8s7S73er0wGQzLIjnz4quNMkXYKxvW9pvf8ObS0tPT54dc4Z3c+3YL+T58D91QHuWHXYYzBgK8AjLluWJhWGgPd7Ndx1PYUKrZsxe/f3oi9XwDAzUhZ+DMszrofxvGasIyARiQ9iY/a5qGm+G28sXEzPj5yDsD1mHTXfCx8+GfIuPW7kblTSJBE+T37O9B26m+46oaR0Gpi4PifZP4NP1XvUa7N24e2XycDABoKfohbll353S/M9bsRrv+gh4gcon5EVrXqJty70csTSib+e2ly2OshouCL8oP9wzEmsf+cGEU/H+Y96+H1+Gp2KYQQl/aoN6t3+SSiSIj6EdnEf/oLvlzQji8ONeEcrseYxHiMHX2j1McDiKi3KD9GRkRDQZTvWhLRUMAgIyLpMciISHoMMiKSHoOMiKTHICMi6THIiEh6DDIikh6DjIikxyAjIukxyIhIegwyIpIeg4yIpMcgIyLpMciISHoMMiKSHoOMiKTHICMi6THIiEh6DDIikh6DjIikxyAjIukxyIhIegwyIpIeg4yIpMcgIyLpMciISHoMMiKSHoOMiKTHICMi6THIiEh6DDIikh6DjIikxyAjIukxyIhIegwyIpIeg4yIpMcgIyLpMciISHoMMiKSHoOMiKTHICMi6THI/OrC2bNdkS6CiAbAIPPDvuc5GEbcgNg7F+Od2khXQ0S+XB3pAgavDuxc/zJs6AaOfA9x8ZGuh4h8if4gs3fC1n4eF/21GXYtRmo1iPH8WcNW5K/vBgAYn38Id8R4fSURDQLRH2RNWzDnlmWo8NfGaEb97qXQu35gx57frXS8RpuN/5el9/lSGty6zjbCergOrV0ArhmF+AQDxt54TWSL6jiOir0ncT52MmZOiQvvurvOotF6GHWODYJR8QkwjL0REd4iV05Eu3qzMAIC/h5Gs6i/1Nf4eWSXRuqXJaeephKRm6YVSr/PRxHatFxRfOJChCprF6XZOkct4ewoPU2iJDdNaJX+/VXRponc4hMiUlskGIbQwX4jzPUCQrgf9Wajl3YdKHttpf8RHA1q9roCpI+fiRd22dANQNHqYTAYEKcBgG7Ydr2AjIn34OXqM2Gu7AyqX56HjDWnwrtaex0K0sdj5gu7YHNsEOgNBhgcGwTdtl14IWMi7nm5GuHeIsEyhIIsMPaqV7BojQ2AgszfHcGXX355yY9/uyPSv8UQ1lGOVXcvQ1k3ACUJOcUn8HVbPY4dO4bTHRdw4r1HoVcAdJch5ye/QXlHmOrqOoJN90/BbTll6A7TKh06UL7qbixzbBAk5RTjxNdtqD92DMdOd+DCiffwqGODoCznJ/hN2DZIkEV4RBh6rt1EozDX933K2HvXsscq1hnVIXfCalHTE4mC6UrUrk1w7UIu3N7utU17yXKhVXerEtbWhriiC+JEca5I0Xg5DBGOXcvatSLBuQu5cLvwukXaS8RyrbPfrxWh3iKhwBGZix11Gx5BdgUA6JBX8Csk8kylZKrw9nOHHYuG5/D07BFeW42YsRKvZDiWDz/3NqpCVE3Hod/hoX+IxfczXsDeTsfPlKQkJIRofd5Uvf0cHFvEgOeeng2vW2TEDKx0bxC8HaoNEkIMMtXFug145HHHsF+XvQFPJDPFpHOoHG/ZHIvaB1OR6LNhHFLmqsdHbb9FcYj+cG1V6/H7WjXBoEFKbgmO7/tP3BWa1XlxCOXuDYJU3xsEcSlz4dgiNvw2VBskhKJ/+kVAKvDC3N87jqtol+HVxyaip6UFLYG+3Ns8NAq7jqZDsKrLP5ue5LetPn46FFSgGzbsqm0EkseGqCoF+tn/htde/Wf8eNw1AMpCtB4vOppwyL1B4HeL6OMxXQEqugHbrlo0Ihmh2iKhwCADABjxL9Uv4rENz6FE/xC+WnwTbrqU05b95qFRJLScdv7VGjBu1ABfK3HxmAagAsDnTW1ACP5sr0t8FmV1aTCOj9CXXMtpV7Abxo0aoIY4xLs3CEKzRUKHQeYUE4tpS1/FNDSgINK10GVptjp3ib6DG64doPGwYfiWumj7+nxI6hmVPBujQvLOAWq2uo7/fWfgDYJh7g2C0GyR0OExsn70WLq793yzvo+e1hJk69TmShLyVs/laEw2Y8fi1kjXMKiMxViJNwhHZJfIXrcFC+5+AFtPAdCkw7x3G5ZOkv4CDyKpMcgugb3ud7h32i9h6QSgy0LRzkLcH89D/ESRxiALUO8Qy0bJgZcwI5YhJq3GRtREuoZBpRGNEm8QHiMLgGeIKfoclH72CkNsEBoZZ1CX7Pi73/s2Abh4ET3qohIzLIRVRdDIOLi2yMAbBBfdGwSybREG2QAcx8TcIVZc9RJM3ieMU4SN0k1Ul6pgbR6gcXMdDqiL0yeOCWFVETRKB9cWGXiDoM69QSDbFhlCu5Z12PzSs6i/0f2Ts1V1/l9ypgzPOA/s67IHDLHass345qYM3DohCu7vJKFRE40wYBusACqtdYDJ9219GxsOqBdva5E2WaYZU5dg1EQYDcA2xwZBHUzwuUUaG3BAvZpdmzZZqjlkAIbSReOXcD8yIYRoLxU5esXxvC5blHq//tjDObF9YRgvBqb+eirFKufFzxmF4rTPhi2iKFNtpywRO86FsUZRKrLDdtF4j6hcpVX7eYYo9L1BREtRputi+yXh3SBBMYR2LbUwPbwCK1a4Hw+btN6b2utQMC8Dqxu6AcUE885AdidtaDmhrmnAyYcUEjHTMOth9TMtzkVBld1rM/uhjcjf5ljWPr0Edw8PU31hF4Npsx6GY4sUI7egCl63iP0QNro3CJbIuEEinaQhdym38RFCCHFO7MpR7+CpmMS62kDvm7lL5Kh33zS9eTJ49dOlOV0kMp13QdWkC3Ofz+9CbaHI0sH1+b7ZbygeauEckQkhxGlRlKnuWUAj0s21ve8Ee6FWFGbpXKMxU/g3SFAMoWNkgbFXvYIFq08BUJC5YSMeCXSya+uXOK4eY7h1rHRHGKJH3P1YX7wXn/5oDU51WrBscizyZs7FgqTRaKkuxHslzXDcj0KH7OJ38FDUX5IRh/vXF2Pvpz/CmlOdsCybjNi8mZi7IAmjW6pR+F4JmtUbdOiyi/GOrBsk0kkacufqxG6LRVgsu0XdgLv+58SOJUoAx1i82Jen3qyv/8iPwq1HtJbne7+ZISCgSRG5JU0iMvfNDPeIzKGntVzkp2h8HCfWiJTcEtEk8Y1Eo39ENnw8jDPHB9jYhuZj6rDqO9dDE/BKzqDsrfWwAYB2BhIl/VKLHjGI/ceV2NP+BI4f/Bi7du5FfScAzS1IuTsNd902HpqITQOMhXHFCigAMC02bGuNif1HrNzTjieOH8THu3Zir2OD4JaUu5F2120YH7kNEhRXCSFEpIsYPFqxZc5oPLANABRo0x7AL1NGD/iqll2vY4N6C1Bd3j40/PH+DTsAAAjLSURBVDqZ9yYjCiMGWR+9Lgq/RJr0N3Fg+0Pg5ZdE4cUg86brr6ip+gSf7DiELwNpr7kFptn3IO3W73IiLFEEMMiISHpDaEIsEUUrBhkRSY9BRkTSY5ARkfQYZEQkPQYZEUmPQUZE0mOQEZH0GGREJD0GGRFJj0FGRNJjkBGR9BhkRCQ9BhkRSY9BRkTSY5ARkfQYZEQkPQYZEUmPQUZE0mOQEZH0GGREJD0GGRFJj0FGRNJjkBGR9BhkRCQ9BhkRSY9BRkTSY5ARkfQYZEQkPQYZEUmPQUZE0mOQEZH0GGREJD0GGRFJj0FGRNJjkBGR9BhkRCQ9BhkRSY9BRkTSY5ARkfQYZEQkPQYZEUmPQUZE0mOQ2euw21KDxrNdka6EiC7TEA8yO+rW/xL/+OMpGDciHisr/xbpgojoMgzpILPXbcAj2RUAAF1OIZ6d/u0IV0REl+MqIYSIdBERYT+El6feipzDAKBAqx+LEVcH+Nrb8vDR5gcwNoTlEVHgAv3TjTJnUPbMbDXEAKAbtgYrbIG+/Dvf4GJoCqNgsnfi+MGPsWvnXtR3AtDcgpS703DXbeOhiQlnIV34a00VPvlkBw59CQA3IfGeO3Fn8q347jXhrKMP+yEU/etmfPa9e/DU46kYFcFSrtQQHJHZUVeQjsRlZeiGDksKCpClG/hVZ//8LB54uRoAoMvZhc9fSsXwEFdKl8uOtr+sxk9mrcLeTi9Pa1KQ++5W/PsMHUKdZ11HNmHJfUuwuaG7/5OKHvNfexv/vTQJI0JcR39nUPbUFPxozSnAaEb97qXQh72GIBJDSo+wmk1CAQSgCJPZKnoCeZXVLEwKBAChmMzCGsiLKGLaS7OFDo7PC9CIyXOWixUrVohFM+OExvVznch6pz6kdXj2G0AR2rRFYsWKFWL5nCSh9fh5Uv4+cS6klfQtrFWU5yWpfwcQMJpFaLdE6A2hIOsR1jfTXR1Zl10q2gN5WXupyNYxxKRxukhkOkNCt0S8d+JCr6cv1BaKLPXzhJIpik6HqI6efSLPtR6TWL2/tdeXZk9ruchLUlyhmrcvPB2rp3W/WJuuUdfLIJNMuzi42uT6Brr+wY2i9tSX4ssvB3icLHaFGK5/ULx1zEe7MxcGLoHCoEdUrtIOGA49NatFgtoXtKsqAxqVX6rThRmuEVemr7Rs3y4WOkM3o1CEKlOFEEL0tIr95vlCr6B3iDHIJNHTJLY/qncPo0PxyC6N9G9JQgjRUyqylUD+OM+J7Qudo6VsURr0JKsXZmNg71+z2qD2I6MwhypNPv9fkaLp3Wc16Y+KhYboCbLoPmv5t4NY/cPpeKbacaBVp9Ph1KlTgKKFfuwIn6ds/36mEQ227sDb0eBQXY5N6sdhyEz2c/B6OG6flQls3AZ0r0PxnldgSg1iHa0H8GGFupx1F6b6OaOQmPogtMiDDRXYXN6IpfoQTOppO+o+6aHoMf+1Iryy6Eb8ybQOG63BX10kRHeQfTsBi158Ev9776sYnluGnalbMfJHa4Bpz8Pi5yxNQ8EPccuyisDb0aDQevIz1xSajKmJftuOGjcFWmyDDd2orGsAUoN4zu6LY3D2CmPqRP9nt2+eACOAbQDKDp0EQjY7UYOU3A0oyJ6NybExABpCtJ7IiO4gQwxiZ7yEqqanoYmNRUzZ1kgXRCHUeuqoupQMQ9wAjUfGIR6ADcCBumYgiJMPOtoaXIF6681j/DceFeuOrppGNCIEURb/JOo6XsL48E6eC6soDzKHEbGxkS6BwqC92bmfFIOrhw3Q+DoNtOpitz2405ttLSdcy9+6eqDwuBY3aOFI1J6LoZloHTcO40PxvoPIkAiyfuzn8HVLC1p8PP31OfultSP5eI6EImok4pxDQ7psQzPIqp7GtJueDl47IoqooRlkPGtJFFWGZpDxrCU1NGB/pGsAADShll3oig3p+5FRdIkzJKtLX+Hr84G/TnvDtUGtY8zNt7qW2893XEIhGlwX1EqGDgYZRY3R3zOoS1acbB3gRExTrXuu14Sbg1pHzKhxcFVy2tepIpXHyFA7ZZzUt9KJpKG5a8mzllFpePx0JGMjqgB8XHMUmOF7UmydtVJdMsA4McjxMfFW3AXACqCqsg4dj8T7nBRrP1HtCtQfT50Q3DqGkKEZZDxrGZ3ip+NnBqDKClj/8AmOPJ2ISV4bNqC8sMqxqJ0H48Qg1xHzA2QsBF7fCGDrNlS8NhsZXpPMjt071qvLGUhP4h3uLhd3LSmKJOKe5QmOxcPP4DcfnPHa6syf/wsr1WFQwr/Mxx1Bn/A+HKkPLoECAN3rseKNQ/A2drfXrce/rnGc9VaWPIF7B7oagXwamkFmNKPeceePK3rUm42R/k2oj0mLX0OODgC6sfHeKVj6/km4/6O/Lpx8fzmS733dMf9Ul421C7yP2a7U8IxcrDMpAIDDOclIf/kA2lxpZkfbgVdx77RHHbuVigmvPXM37zh8JcJ/w40IunDGcf+wto6g3IOqp6ON9yMbhHrfmRUCmjhhMBhEnOetbBSTMIf6LpkeN+V0rFMr9AaD0GsVj1vq6ER2aUC3+Awyj1sNRcFtfIZWkNGQ0dNUInLTtF7uQ6cIbVquKD4Rpi+f9oPCvHiyxy22Pe4JNnmxMPe5c2z4RFeQRfl/PtKB4xV7cfIS5hRduWsxLsWI8dxPGBS6zjbCergOrV0ArhmF+AQDxt4Y/v+6yN5pQ8ORajSdA4DrMSZpEvRaTcj/8xPfPP42rh2HFON4qXdtozzIGlDww1sQ3sn3Rpjrd2Op1P8lDZFconz6xTBcN8YAw1fhXOcYXDfQLWSIKKiifERGREPB0Jx+QURRhUFGRNJjkBGR9BhkRCQ9BhkRSY9BRkTSY5ARkfQYZEQkPQYZEUmPQUZE0mOQEZH0GGREJD0GGRFJj0FGRNJjkBGR9BhkRCQ9BhkRSY9BRkTSY5ARkfQYZEQkPQYZEUmPQUZE0mOQEZH0GGREJD0GGRFJj0FGRNL7/7ac9yWUMR/tAAAAAElFTkSuQmCC"
    }
   },
   "cell_type": "markdown",
   "id": "3a4c6283",
   "metadata": {},
   "source": [
    "![image.png](attachment:image.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25d7c53b",
   "metadata": {},
   "source": [
    "在独热编码中，每个离散特征都会被映射成一个二进制向量。在猫吃鱼这个例子中，第i个二进制位为1，表示该词是词表的第i个词。猫是第一个词，我们可以用[1, 0, 0]来表示它，吃是第二个词，就可以用[0, 1, 0]来表示，同理可以用[0, 0, 1]来表示第三个词鱼。\n",
    "\n",
    "比如我们要创建一个索引为1的独热向量就可以用下面的代码实现。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "276b1fda",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[1, 0, 0,  ..., 0, 0, 0]])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "from torch.nn import functional as F\n",
    "F.one_hot(torch.tensor([0]), len(vocab))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c51bd4e4",
   "metadata": {},
   "source": [
    "**梗直哥提示：本节我们学习了文本数据的预处理方法，经过预处理后的文本，就变成了模型能够计算的数据。更加详细的知识点的掌握，有赖于你在实战中总结经验，慢慢就熟悉了。当然，如果你想大幅节省时间，解答自己在学习中的各种困惑，欢迎选修《梗直哥深度学习：python实战》。**"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
